

Search for: All records

Creators/Authors contains: "Kumar, Arun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Background: Low back pain (LBP) is a significant public health problem that can result in physical disability and financial burden for the individual and society. Physical therapy is effective for managing LBP and includes evaluation of posture and movement, interventions directed at modifying posture and movement, and prescription of exercises. However, physical therapists have limited tools for objective evaluation of low back posture and movement and for monitoring of exercises, and this evaluation is limited to the time frame of a clinical encounter. There is a need for a valid tool that can be used to evaluate low back posture and movement and to monitor exercises outside the clinic. To address this need, a fabric-based wearable sensor, Motion Tape (MT), was developed and adapted for a low back use case. MT is a low-profile, disposable, self-adhesive skin-strain sensor made by spray coating piezoresistive graphene nanocomposites directly onto commercial kinesiology tape.
     Objective: The objectives of this study were to (1) validate MT for measuring low back posture and movement and (2) assess the acceptability of MT for users.
     Methods: A total of 10 participants without LBP were tested. A 3D optical motion capture system was used as the reference standard for low back kinematics. Retroreflective markers and a matrix of MTs were placed on the low back to measure kinematics (motion capture) and strain (MT) simultaneously during low back movements in the sagittal, frontal, and axial planes. Cross-correlation coefficients were calculated to evaluate the concurrent validity of MT strain against reference motion capture kinematics during each movement. The acceptability of MT was assessed through semistructured interviews conducted with each participant after laboratory testing; interview data were analyzed using rapid qualitative analysis to identify themes and subthemes of user acceptability.
     Results: Visual inspection of concurrent MT strain and low back kinematics indicated that MT can distinguish between different movement directions. Cross-correlation coefficients between MT strain and motion capture kinematics ranged from –0.915 to 0.983, and the strength of the correlations varied across MT placements and low back movement directions. Regarding user acceptability, participants expressed enthusiasm toward MT and believed that it would be helpful for remote interventions for LBP, while offering suggestions for improvement.
     Conclusions: MT was able to distinguish between different low back movements, and most MTs demonstrated moderate to high correlation with motion capture kinematics. This preliminary laboratory validation of MT provides a basis for future device improvements, which will also involve testing in a free-living environment. Overall, users found MT acceptable for use in physical therapy for managing LBP.
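To make the validity analysis concrete, here is a minimal sketch of the kind of peak normalized cross-correlation used to compare an MT strain signal with a motion-capture angle. The sampling rate, signal shapes, and noise model are illustrative assumptions, not the study's data or code.

```python
# Hedged sketch: concurrent-validity check between a strain signal and a
# motion-capture joint angle via normalized cross-correlation. All signals
# and the 100 Hz rate below are synthetic assumptions.
import numpy as np

def peak_cross_correlation(strain: np.ndarray, angle: np.ndarray) -> float:
    """Return the peak normalized cross-correlation coefficient in [-1, 1]."""
    s = (strain - strain.mean()) / strain.std()
    a = (angle - angle.mean()) / angle.std()
    # Full cross-correlation, scaled so the zero-lag value equals the
    # Pearson correlation of the two z-scored signals.
    xcorr = np.correlate(s, a, mode="full") / len(s)
    # Keep the sign: the study reports correlations from -0.915 to 0.983.
    return float(xcorr[np.abs(xcorr).argmax()])

t = np.linspace(0, 5, 500)                              # 5 s at an assumed 100 Hz
angle = 30 * np.sin(2 * np.pi * 0.4 * t)                # lumbar flexion angle (deg)
strain = 0.8 * angle + np.random.normal(0, 1, t.size)   # noisy strain response
print(f"peak r = {peak_cross_correlation(strain, angle):.3f}")
```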
  2. Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. Such models must be trained on multiple GPUs due to their size and computational load, driving the development of a bevy of model parallelism techniques and tools. Navigating such parallelism choices, however, is a new burden for DL users such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we unify these three burdens by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, exploiting the performance opportunities presented by joint optimization. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39% to 49% lower model selection runtimes than current DL practice.
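As a rough illustration of what an MILP over these three choices can look like, the sketch below solves a heavily simplified, single-wave relaxation of SPASE with PuLP: each model picks one (parallelism, GPU count) pair, all picks must fit on the cluster simultaneously, and the makespan is minimized. The configs, runtime estimates, and GPU budget are made up, and the paper's actual formulation also models scheduling over time, which this omits.

```python
# Hedged sketch: a single-wave relaxation of the SPASE MILP, not the
# paper's formulation. All numbers are invented stand-ins.
import pulp

models = ["m0", "m1", "m2"]
configs = {"ddp2": 2, "ddp4": 4, "pipe4": 4}      # parallelism scheme -> #GPUs
gpu_budget = 8
# Made-up runtime estimates (minutes), standing in for the empirical profiler.
est = {(m, c): 120.0 / g + 10.0 * i
       for i, m in enumerate(models) for c, g in configs.items()}

prob = pulp.LpProblem("spase_single_wave", pulp.LpMinimize)
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in est}
T = pulp.LpVariable("makespan", lowBound=0)
prob += T                                          # objective: minimize makespan

for m in models:   # each model picks exactly one parallelism/allocation
    prob += pulp.lpSum(x[(m, c)] for c in configs) == 1
    prob += pulp.lpSum(est[(m, c)] * x[(m, c)] for c in configs) <= T
# All chosen allocations must fit on the cluster at once (the big simplification).
prob += pulp.lpSum(configs[c] * x[(m, c)] for m in models for c in configs) <= gpu_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (m, c), var in x.items():
    if var.value() > 0.5:
        print(f"{m} -> {c} on {configs[c]} GPUs")
```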
  3. Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and its scalability challenges have also sparked interest from the systems community, with efforts to build systems that provide higher efficiency and schemes to reduce costs. However, we observe that many such systems essentially reinvent the wheel, duplicating much prior work from the database world on scalable graph analytics engines. Further, they often tightly couple the scalability treatment of graph data processing with that of GNN training, resulting in entangled, complex problems and systems that often do not scale well along one of those axes. In this paper, we ask a fundamental question: how far can we push existing systems for scalable graph analytics and deep learning (DL) instead of building custom GNN systems? Are compromises inevitable on scalability and/or runtimes? We propose Lotan, the first scalable and optimized data system for full-batch GNN training with decoupled scaling that bridges the hitherto siloed worlds of graph analytics systems and DL systems. Lotan offers a series of technical innovations, including re-imagining GNN training as query-plan-like dataflows, execution plan rewriting, optimized data movement between systems, a GNN-centric graph partitioning scheme, and the first known GNN model batching scheme. We prototyped Lotan on top of GraphX and PyTorch. An empirical evaluation using several real-world benchmark GNN workloads reveals a promising, nuanced picture: Lotan significantly surpasses the scalability of state-of-the-art custom GNN systems, while often matching or coming only slightly behind on time-to-accuracy metrics. We also show the impact of our system optimizations. Overall, our work shows that the GNN world can indeed benefit from building on top of scalable graph analytics engines. Lotan's new level of scalability can also empower new ML-oriented research on ever-larger graphs and GNNs.
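The core of decoupled scaling is visible in a single GNN layer: neighbor aggregation is a data-movement-heavy operation suited to a graph/dataflow engine, while the per-node transformation is ordinary DL. The single-process PyTorch sketch below mimics that split with a sparse matmul followed by a linear layer; it is illustrative only, not Lotan's GraphX + PyTorch implementation, and all sizes are arbitrary.

```python
# Hedged sketch of the decoupled-scaling idea: separate neighbor aggregation
# (the graph engine's job, mimicked here by a sparse matmul) from the neural
# transformation (the DL engine's job). Not Lotan's actual code.
import torch

n, in_dim, out_dim = 1000, 32, 16
# Normalized adjacency as a sparse COO tensor (stand-in for the graph side).
idx = torch.randint(0, n, (2, 5000))
val = torch.full((5000,), 1.0 / 5)                 # toy edge normalization
A = torch.sparse_coo_tensor(idx, val, (n, n)).coalesce()

H = torch.randn(n, in_dim)                         # node feature matrix
W = torch.nn.Linear(in_dim, out_dim)               # per-node NN transform

agg = torch.sparse.mm(A, H)     # step 1: aggregation (graph-engine dataflow)
H_next = torch.relu(W(agg))     # step 2: per-node transform (DL engine)
print(H_next.shape)
```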
  4. Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or, at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare the standard approaches to executing this workload today: task parallelism and data parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas in a system we call Kingpin, built on top of existing ML tools and the flexible massively parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
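A minimal caricature of grouped learning in the spirit of GAP: scan the data once per epoch, accumulate gradients into each group's model as its rows stream by, and apply one update per group per scan, so no group forces a separate pass over the data. The batch generator and model shapes below are invented for illustration; Kingpin distributes this pattern over Ray rather than running it in one process.

```python
# Hedged sketch of grouped learning: one data scan serves all group models,
# with gradients accumulated per group. A single-machine caricature only.
import torch

n_groups, dim = 3, 8
models = [torch.nn.Linear(dim, 1) for _ in range(n_groups)]
opts = [torch.optim.SGD(m.parameters(), lr=0.01) for m in models]
loss_fn = torch.nn.MSELoss()

def partition_batches():
    """Yield (features, labels, group_id) minibatches; stands in for data I/O."""
    for _ in range(10):
        g = int(torch.randint(0, n_groups, (1,)))
        yield torch.randn(32, dim), torch.randn(32, 1), g

for epoch in range(2):
    for o in opts:
        o.zero_grad()
    for X, y, g in partition_batches():      # single scan serves every group
        loss_fn(models[g](X), y).backward()  # gradients accumulate per group
    for o in opts:
        o.step()                             # one update per group per scan
```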
  5. Deep learning (DL) is revolutionizing many fields. However, there is a major bottleneck to the wide adoption of DL: the pain of model selection, which requires exploring a large config space of model architectures and training hyper-parameters before picking the best model. The two existing popular paradigms for exploring this config space pose a false dichotomy. AutoML-based model selection explores configs with high throughput but makes minimal use of human intuition. Alternatively, interactive human-in-the-loop model selection relies entirely on human intuition to explore the config space but often has very low throughput. To mitigate the above drawbacks, we propose a new paradigm for model selection that we call intermittent human-in-the-loop model selection. In this demonstration, we will showcase our approach using five real-world DL model selection workloads. A short video of our demonstration can be found here: https://youtu.be/K3THQy5McXc.
  6. Background: Hip-worn accelerometer cut-points have poor validity for assessing children's sedentary time, which may partly explain the equivocal health associations shown in prior research. Improved processing and classification methods for these monitors would enrich the evidence base and inform the development of more effective public health guidelines. The present study aimed to develop and evaluate a novel computational method (CHAP-child) for classifying sedentary time from hip-worn accelerometer data.
     Methods: Participants were 278 children aged 8 to 11 years, recruited from nine primary schools of differing socioeconomic status in Melbourne, Australia. Participants concurrently wore a thigh-worn activPAL (ground truth) and a hip-worn ActiGraph (test measure) during up to 4 seasonal assessment periods, each lasting up to 8 days. activPAL data were used to train and evaluate the CHAP-child deep learning model, which classifies each 10-s epoch of raw ActiGraph acceleration data as sitting or non-sitting, creating comparable information from the two monitors. CHAP-child was evaluated alongside the current-practice 100 counts per minute (cpm) method for hip-worn ActiGraph monitors. Performance was tested for each 10-s epoch and for participant-season level sedentary time and bout variables (e.g., mean bout duration).
     Results: Across participant-seasons, CHAP-child correctly classified each epoch as sitting or non-sitting relative to activPAL, with a mean balanced accuracy of 87.6% (SD = 5.3%). Sit-to-stand transitions were correctly classified with a mean sensitivity of 76.3% (SD = 8.3). For most participant-season level variables, CHAP-child estimates were within ±11% (mean absolute percent error [MAPE]) of activPAL, and correlations between CHAP-child and activPAL were generally very large (> 0.80). For the current-practice 100 cpm method, most MAPEs were greater than ±30% and most correlations were small or moderate (≤ 0.60) relative to activPAL.
     Conclusions: There was strong support for the concurrent validity of the CHAP-child classification method, which allows researchers to derive activPAL-equivalent measures of sedentary time, sit-to-stand transitions, and sedentary bout patterns from hip-worn triaxial ActiGraph data. Applying CHAP-child to existing datasets may provide greater insights into the potential impacts and influences of sedentary time in children.
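For readers unfamiliar with the two headline metrics, the sketch below computes balanced accuracy (the mean of per-class recall, appropriate for imbalanced sitting vs. non-sitting labels) and MAPE on toy data. The arrays, agreement rate, and summary values are invented; this is not the study's code.

```python
# Hedged sketch: the evaluation's two headline metrics on synthetic data.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
activpal = rng.integers(0, 2, 8640)      # ground-truth 10-s epoch labels (toy)
# Simulated predictions agreeing with ground truth ~90% of the time.
pred = np.where(rng.random(8640) < 0.9, activpal, 1 - activpal)
print("balanced accuracy:", balanced_accuracy_score(activpal, pred))

def mape(truth: np.ndarray, est: np.ndarray) -> float:
    """Mean absolute percent error for participant-season summary variables."""
    return float(np.mean(np.abs((est - truth) / truth)) * 100)

sed_truth = np.array([410.0, 385.0, 450.0])  # sedentary minutes/day (toy)
sed_est = np.array([402.0, 400.0, 445.0])
print("MAPE (%):", mape(sed_truth, sed_est))
```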
  7. Deep learning now offers state-of-the-art accuracy for many prediction tasks. Deep convolutional neural networks (CNNs), a form of deep learning, are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, much work in the computer architecture, systems, and compilers communities studies how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions: it outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction, and it leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames; it also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
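The sketch below shows the naive OBE loop that Krypton targets: slide an occluding patch over the image, re-run inference, and record the drop in the target class score as a heatmap. Krypton's contribution is to avoid exactly this full re-inference by materializing intermediate CNN features and updating only the regions a patch can affect; the toy model, patch size, and image size here are illustrative assumptions, not the tool's code.

```python
# Hedged sketch of naive occlusion-based explanation (OBE). Every occluded
# input triggers a full forward pass; Krypton instead reuses materialized
# intermediate features. Toy model and sizes only.
import torch

model = torch.nn.Sequential(                      # toy CNN stand-in
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10),
).eval()

img = torch.randn(1, 3, 64, 64)
patch, stride = 16, 16
heat = torch.zeros(64 // stride, 64 // stride)

with torch.no_grad():
    target = model(img).argmax(dim=1).item()      # class to explain
    base = model(img)[0, target].item()           # unoccluded target score
    for i in range(0, 64, stride):                # slide the occluding patch
        for j in range(0, 64, stride):
            occluded = img.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.0
            score = model(occluded)[0, target].item()
            heat[i // stride, j // stride] = base - score  # bigger drop => more important
print(heat)
```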